
Self-supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation



Abstract

Enabling robots to autonomously navigate complex environments is essential for real-world deployment. Prior methods approach this problem by having the robot maintain an internal map of the world, and then use a localization and planning method to navigate through the internal map. However, these approaches often include a variety of assumptions, are computationally intensive, and do not learn from failures. In contrast, learning-based methods improve as the robot acts in the environment, but are difficult to deploy in the real world due to their high sample complexity. To address the need to learn complex policies with few samples, we propose a generalized computation graph that subsumes value-based model-free methods and model-based methods, with specific instantiations interpolating between model-free and model-based. We then instantiate this graph to form a navigation model that learns from raw images and is sample efficient. Our simulated car experiments explore the design decisions of our navigation model, and show our approach outperforms single-step and $N$-step double Q-learning. We also evaluate our approach on a real-world RC car and show it can learn to navigate through a complex indoor environment with a few hours of fully autonomous, self-supervised training. Videos of the experiments and code can be found at github.com/gkahn13/gcg
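The abstract compares against $N$-step double Q-learning, one of the model-free baselines the generalized computation graph subsumes. As a rough illustration only (not the paper's actual model; the linear Q-functions, dimensions, and hyperparameters below are hypothetical stand-ins for the learned networks), the $N$-step double Q-learning target can be sketched as: the online network selects the bootstrap action, while the separate target network evaluates it.

```python
import numpy as np

rng = np.random.default_rng(0)
num_actions, obs_dim, gamma = 4, 8, 0.99  # hypothetical toy dimensions

# Hypothetical linear Q-functions standing in for the online and target networks.
W_online = rng.standard_normal((obs_dim, num_actions))
W_target = rng.standard_normal((obs_dim, num_actions))

def n_step_double_q_target(rewards, bootstrap_obs, done):
    """Compute an N-step double Q-learning target:
    the discounted sum of the N observed rewards, plus a bootstrapped value
    where the online network picks the action and the target network scores it.
    """
    g = sum(gamma**k * r for k, r in enumerate(rewards))
    if not done:
        a_star = int(np.argmax(bootstrap_obs @ W_online))              # selection
        g += gamma**len(rewards) * (bootstrap_obs @ W_target)[a_star]  # evaluation
    return g

# Usage: a 5-step segment ending in a non-terminal observation.
rewards = [1.0, 0.0, 0.5, 0.0, 1.0]
target = n_step_double_q_target(rewards, rng.standard_normal(obs_dim), done=False)
```

Decoupling action selection from evaluation is what distinguishes double Q-learning from vanilla Q-learning and reduces the overestimation bias of the max operator; setting $N=1$ recovers the single-step baseline also mentioned above.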
